A Torquemanual for the Inquisitive
Documenting Torquemada version 1.1.0, © 1992 by Greg Swann
5/27/92
Greg Swann
CompuServe: 70640,1574
USPS: P.O. Box 1724
Andover, MA 01810
Table of Disorganization
0. Introductory Chatter
1. Torquemada Basics
2. String Literals
3. Aliases
4. Case Conversion
5. Untyped Wildcards
6. Typed Wildcards
7. Wildstrings
8. Putting It All Together
9. Bringing It All Back Home
0. Introductory Chatter
This is kind of a funny little Ugly Duckling story. Torquemada the
Inquisitor started life as a fluffy little bit of nothingstuff with a
Mac interface wrapped around it. When we were testing XP8, Garry
Fairbairn (the Argus of Saskatoon) had asked me to come up with a
search and replace engine that he could use to translate American
spellings to those favored by people more recently divorced from the
Crown. I came home one night and spent two minutes writing a little
toy that traded massive quantities of memory for very, very fast
searches. That little core was the basis of the first Torquemada.
Torquemada 1.0.0 worked only on string literals. It sported no
wildcards. It read only one line at a time. But it was very,
very fast...
We've been through ten versions since then. Along the way, the
search engine has been replaced twice, making it somewhat slower but
a lot less hungry for memory. The read/write code has been replaced
three times. The interface has been (transparently) overhauled. And
somewhere in there, I began to take it seriously... (grin)
The current release, version 1.1.0, can deploy up to 640 search and
replace strings on up to 128 files in a drag and drop batch, and the
searches can employ up to 29 wildthings of various stripes
("wildthings" are Torquemada's multifaceted wildcards; there will be
_much_ more about these in due course). And it is still very, very
fast...
And this is very far from the end of the story. But it is the end for
now, which makes this the appropriate place to fully document the
software as it now stands. Torquemada 2.0 will offer a great host of
new features, but it will not go into development for some while. In
the interim, I didn't want to leave users puzzling over ten sets of
release notes scattered in four separate files.
But: as a warning: I am not very good at this (witness my performance
so far (grin)). My feeling is that writing technical documentation is
something best done by someone other than the programmer, someone who
had to _learn_ the answers to questions, and who had to remember that
process of discovery. I'll do my best to do a full brain-dump, but I
don't promise that the results will be clear to anyone but myself.
Commercial, legal and other pertinent notices:
XP8 is a text file reformatter. It will clean up and make
QuarkXPress-ready Macintosh or DOS text files. Among many other
features, it intelligently reformats paragraphs, converts the DOS or
WordStar character sets to their Mac equivalents, substantially
improves the hyphenation and justification of text, converts quotes
better than any software currently available, and traps for XPress
Tags errors that might otherwise result in missing text or
irreversible document corruption. A semi-inhibited shareware version
of XP8 is available on CompuServe (GO DTPFORUM, Library 5) and other
electronic information services. The full commercial release can be
obtained from Greg Swann at:
P.O. Box 1724
Andover, MA 01810
Licenses are sold per machine, with a single license costing $50;
2-10 licenses are $45 each; and 11 or more licenses are $40 each.
Torquemada is freeware in its current release. Torquemada 2.0 will be
a commercial release, and it will be priced in the same range as is
XP8.
Torquemada is delivered "as is", without any warranties, expressed or
implied. It is not warranted to be useful _to_ anyone, _for_
anything, and in no wise am I to be held responsible for any
unfortunate consequences resulting from its use or misuse. And I
_hate_ having to say things like that. I do my best to write useful,
simple, elegant, bug-free solutions to difficult problems. In this
case, I am giving of my labor at no charge at all. If you take it
into your head that I represent your big chance to 'strike it rich,'
you will pay a lot in legal fees to discover that you have
miscalculated. It's sad and sick and stupid that we live in a world
of bloodsuckers, but I _promise_ I will not be leech lunch. So there!
And: to those to whom the above disclaimer does not apply: forgive me
for having to make it. It's _you_ whom I'm working for, for pay or for
free. I appreciate your custom and your support, and I wish we all
could just comb the others out of our hair...
(Hey, it's a real 'personal' software company! (grin))
Some notes at random:
While XP8 is a QuarkXPress-specific program, Torquemada can be used
with any Macintosh software that can read and write files of type
TEXT (plain text files). For that reason, this manual is produced as
a plain text file. I would much prefer to write it as an XPress file,
but that wouldn't be terribly useful to people using PageMaker,
ReadySetGo! or FrameMaker. For ease of use with a variety of text
editors, this file is produced in DOS-like fashion; each line ends
with a carriage return. Ideally, you should read it and print it out
in a monospaced font such as Courier, since in certain places things
are aligned with spaces. And: where a certain keystroke might be
ambiguous, I am naming it in words enclosed in braces; for example,
{space} means the spacebar character.
In the same neighborhood, because I myself am an XPress user, and
because my own uses for Torquemada are focused on Quark, my examples
tend to be intensely Quark-like. This is a side-effect that will have
to be overcome by users of other software. Reasoning by analogy, you
will be able to see how to apply my examples to similar features in
your software of choice.
To make the best use of this software (and of XP8, for that matter),
you need some type of fairly functional text editor/word processor. I
use Word or BBEdit, a freeware programmer's editor. The latter is
highly recommended for our purposes.
As mentioned above, Torquemada owes its origins to some exceptionally
literate caterwauling by Garry Fairbairn. Its further development was
goaded in good measure by some very compelling growling by Shane
Stanley (who has a unique perspective on things due to his insistence
on living upside down in an entirely different day). Along the way, a
number of other users made valid, valuable suggestions, and I'm sure
that will continue to be the case. More than any other piece of
software by me, Torquemada has been influenced by its users, and,
here, at the outset, I want to encourage you to contact me with any
requests or suggestions you have.
The default "Torquemada Prefs" file shipped with this archive pays
homage to the 20 brave souls who lent their diligence and
intelligence to the XP8 beta-testing process. Despite my gentle
ribbing, they have my gratitude and my highest respect.
Mike Arst (inventor of the Kvetchamatic Irregular Expression Parser)
had a profound and lasting impact on this document. Obviously, any
remaining errors or ambiguities are my responsibility; he takes
credit only for the stuff that is clear, grammatically correct, and
spelled properly in languages currently in use by humans. Seriously:
he has a rare talent for inducing a tabula rasa mental state (this
may not be a compliment!), such that he can spot hand-waving or other
logical elisions better than anyone I know. It was he who forced me
to write the (to me tedious) descriptions of what each and every
command means. If you are a software developer, you are well-advised to
pay Mike whatever he asks to critique your documentation. He can be
reached on CompuServe at 70403,1337 or by USPS at:
Mike Arst
2459 Fifth Avenue West
Seattle, WA 98119-2506
Perhaps because of its democratic roots (how odd for a program named
after an autocrat), Torquemada the Inquisitor has sprouted a great
host of nicknames. I tend to call things by (quasi-)acronym, so he's
TQM to me. From the very beginning he was Torque or Tork to others,
and these gave rise to verb forms: I Torque, you were Torking, they
had Torqued.
And, for reference: Tomas de Torquemada was the First Grand
Inquisitor of the Spanish Inquisition. The name was later adopted as
a nom de guerre by the first editor of the Sunday Times of London
Crossword Puzzle, and I probably would not have used it were it not
for that latter association.
1. Torquemada Basics
Torquemada is: 32-bit clean, System 7 compatible, MultiFinder eager,
Apple Event aware, kind to children and house pets, and safe for use
on your precious hardwood floors.
Torquemada will run on any Mac from the Plus up. It requires 768K of
RAM, and you are cautioned not to set the partition smaller than
this; we're using it all.
The interface consists of two dialog boxes and two help windows, and
all of these are very straightforward in operation. If you can use a
Mac at all, you can use my boy Torque.
The Pyre of Purification dialog box uses 10 point Helvetica. If you
don't have that bitmap loaded, you will need to load it into either
your System file or your Core Fonts suitcase. If you don't, Torquemada
will still work, but the search and replace strings will be hard to
read.
And: when your files are being processed, Torque displays the stylish
and attractive Movado Museum Watch Cursor to let you know that things
are really happening.
TQM reads files of type 'TEXT'. It will also see files with the
types 'text', 'MDOS', 'mdos', 'CRLF' and 'crlf'. These latter
types are supported because some third-party products have very odd
ideas about how to treat 'TEXT' files.
Torquemada writes files of type 'TEXT' (with the creator type
'XP84'). These files can be opened from any application that can read
plain text files. They can be processed directly by XP8 or other
utilities of mine. Or they can be imported directly into QuarkXPress,
PageMaker or any other publishing application with a plain- or
smart-ASCII filter.
You write your search and replace strings into the Pyre of
Purification dialog box. Searches go on the left and replaces on the
right. Space is allotted for up to 20 search and replace strings, and
precious little space is left on a nine-inch screen! Torquemada
search and replace strings have an enormous maximum length (254
characters). The normal Macintosh text editing services are available
in the Pyre, along with Select All, Copy, Cut, Paste and Clear. You
can Copy from your text editor and Paste into the Pyre, which is the
fastest way to write search strings.
The Inquisitor reads and writes sets of search and replace strings.
This is valuable: in many cases, you need to perform the same
searches on the same types of files, week in and week out. Once you
have perfected a set of searches, you can save it for reuse. Through
the Load Set button in the Pyre of Purification dialog box,
Torquemada will read set files of type 'PREF'. It
will also 'see' the types 'pref', 'TEXT', 'text', 'CRLF' and 'crlf'.
These latter types are supported so that, if you want, you can write
sets in your word processor. A TQM set consists of 40 plain text
lines in the form of search, replace, search, replace, etc. When you
hit the Save Set button in the dialog, the software will write the
currently active set to a file of type 'PREF'.
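If you do write sets in your word processor, the layout is simple
enough to check by machine. What follows is a minimal sketch in
Python (not Torquemada's own code) of reading the search, replace,
search, replace layout described above. The name parse_set is
hypothetical, and two behaviors are assumptions rather than
documented facts: blank search lines are treated as unused slots,
and a search with no replace line after it gets an empty replace.

```python
def parse_set(text):
    """Split a set file's text into (search, replace) pairs."""
    # Mac text files end lines with a carriage return; accept DOS/Unix endings too.
    lines = text.replace("\r\n", "\r").replace("\n", "\r").split("\r")
    if lines and lines[-1] == "":
        lines.pop()                  # drop the empty piece after a final return
    if len(lines) % 2:
        lines.append("")             # assumption: a missing replace line means "empty"
    pairs = zip(lines[0::2], lines[1::2])
    # Assumption: a blank search line is an unused slot, so skip it.
    return [(search, replace) for search, replace in pairs if search]


demo = "Elvis\rThe King\r^pGreg\r^p<B>Greg<B>\r\r\r"
for search, replace in parse_set(demo):
    print(repr(search), "->", repr(replace))
```

A full set of 40 lines yields 20 pairs, which matches the 20 rows in
the Pyre of Purification.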
If you have one basic set of searches that you frequently need to
run, you can save it under the name "Torquemada Prefs" and store it
in the same folder where TQM resides. When you launch the software
directly, this set will load automatically.
Alternatively, you can launch Torquemada by double-clicking on a set,
by selecting it and doing a Finder Open, or (under System 7) by
dragging the set onto the application's icon or an alias of it.
Torque will launch and the set you selected will be loaded
instead of Torquemada Prefs. For this to work, the set must have the
creator type 'PREF'.
You can select up to 32 sets (of type 'PREF') in this fashion, and
the sets will be processed in alphabetical order on your text files.
This gives you up to 640 search strings per session, surely enough
for anyone. These sets will persist until you either Quit, Load Set,
Save Set or Clear All. After you do any of the latter three, only the
set visible in the Pyre of Purification will be run on your text.
Under System 7, you can select up to 128 text files and drag and drop
them onto the application's icon or an alias of it. At the same time,
you can drag and drop up to 32 sets (of type 'PREF'). If you do not
select any sets, Torquemada Prefs will be run on your batch of files.
If you do select one or more sets, those sets will be run in
alphabetical order on each of your files.
To make this clear:
You can use the standard file interface to Load Sets and Open and
Save files; or,
You can select a number of sets in the finder and use the standard
file interface to have those sets run in alphabetical order on each
file Opened; or,
Under System 7, you can select a number of sets and a number of files
and drag and drop them all on Torque. Each file will be run against
all the sets.
Not for nerds only:
'Alphabetical order' is serious business. The 'natural' order is the
order selected (clicked upon) in the Finder. That's hard to remember,
and it's hard to predict the effects of multiple file selection done
by dragging out a marquee. So: we're sorting to lexical order: A
comes before Z, which comes before a, which comes before z. The files
'Today', 'TODAY', and 'today' would sort to:
TODAY
Today
today
O comes before o, and T comes before t. If you need to know how sets
will sort, just look at an ASCII chart. Better yet, name your sets
with numbers; use two digits (02, not 2), since 10 comes before 2.
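The ordering described above is easy to try out. This short Python
sketch relies on the fact that Python compares plain ASCII strings
character code by character code, which is the same rule as the
ASCII chart. The set names are made up for illustration, and names
with accented characters may sort differently on a Mac than they do
here.

```python
# Uppercase letters have lower ASCII codes than lowercase letters.
names = ["today", "Today", "TODAY"]
print(sorted(names))

# Without a leading zero, "10 ..." sorts ahead of "2 ...".
unpadded = ["2 Quotes", "10 Cleanup", "1 Spelling"]
print(sorted(unpadded))

# Two-digit prefixes give the order you probably meant.
padded = ["02 Quotes", "10 Cleanup", "01 Spelling"]
print(sorted(padded))
```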
Why are we sorting sets? A batch of sets is a 'virtual' set. The
second search in a single set can operate on the results of the first
search. In the same way, the first search in the second set of a
batch can operate on something left behind by the first set. To use
batches of sets reliably, you have to know the exact order in which
the searches will be made. Sorting the sets gives you greater
control.
Why are the file limits set so high? Because I hate to think that
anything I do is _almost_ good enough. Honestly, if ever you find
yourself running 640 searches, examine your methods to see if they
can be made more general. If ever you find yourself running on a
batch of 128 files, query your source to see if they can tighten up
on their end a bit. But: if you really _do_ need awesome quantities
of strings or files, you've got 'em.
Batch-processed files will have the default extension (.TQM) applied
to them automatically. If a source file has already been Torqued, and
if its name is _exactly_ 31 characters in length, the new name would
be identical to the old. In that one case, we are using a slightly
different extension (.TQµ), so as to avoid toasting the source file.
Logic of Beeps: if you run a batch of files, Torquemada will beep at
you, a cheery little, "Hey, Dufus! Wake up!" TQM also beeps at the
end of any _normal_ run that takes longer than 5 seconds.
Nuts and bolts:
Torquemada reads and writes by the buffer-load. At read time, a
buffer is approximately 16K in size, and space is allocated for a
buffer to quadruple in size during processing. Torque can read around
returns _because_ it is working on these enormous buffers. At the
same time, we divide between buffers intelligently, so as to avoid
missing any searches.
At the time that a new buffer is assembled, a check is run against
the end of the buffer. If any of the search strings are present in
the end of the buffer, the buffer is truncated at the start of the
earliest full match. The remainder is prepended to the next buffer.
Each search string is run against the whole buffer _as it is at the
time of the search._ Consequently, later searches can operate on the
results of earlier replaces. This proves useful on a number of
grounds, as we will see in due course. But it is also grounds for a
certain amount of caution.
The buffer truncation mentioned above can only happen with strings
that represent the literal contents of the source file at read time.
So if a later search needs to find something put in by an earlier
replace, that earlier search needs to include everything expected by
the later search. As an example, this could fail ("^p" is a Torquemada
'alias' that denotes the carriage return character):
^pGreg              ^p<B>Greg<B>
^p^p<B>Greg<B>      ^p^p<BI>Greg<BI>
It could fail because the first search doesn't require Torquemada to
preserve both returns in the same buffer. In a very large file, with
a large number of chances to fail, some few can slip through the cracks. The
solution is simple: specify everything you'll need to see in search
strings that reflect the actual contents of the file:
^p^pGreg            ^p^p<B>Greg<B>
^p^p<B>Greg<B>      ^p^p<BI>Greg<BI>
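To see why the second pair of strings is safer, here is a toy model
in Python. It is a sketch under loud assumptions, not Torquemada's
actual buffer code: it assumes the end-of-buffer check holds back
any tail that could be the beginning of a search string, it uses
plain literal replacement, and it uses 16-character buffers instead
of 16K so that the split is easy to force. The names held_back and
run are hypothetical.

```python
def held_back(buffer, searches):
    """Length of the longest buffer tail that could start a search string."""
    longest = 0
    for s in searches:
        # Try proper prefixes of the search string, longest first.
        for n in range(min(len(s) - 1, len(buffer)), longest, -1):
            if buffer.endswith(s[:n]):
                longest = n
                break
    return longest


def run(text, pairs, size):
    """Apply every (search, replace) pair, in order, one buffer at a time."""
    searches = [search for search, _ in pairs]
    out, carry = [], ""
    for start in range(0, len(text), size):
        buffer = carry + text[start:start + size]
        is_last = start + size >= len(text)
        keep = 0 if is_last else held_back(buffer, searches)
        # Truncate the buffer; the remainder is prepended to the next one.
        buffer, carry = buffer[:len(buffer) - keep], buffer[len(buffer) - keep:]
        for search, replace in pairs:
            buffer = buffer.replace(search, replace)
        out.append(buffer)
    return "".join(out)


text = "x" * 11 + "\r\rGreg says hi"
risky = [("\rGreg", "\r<B>Greg<B>"), ("\r\r<B>Greg<B>", "\r\r<BI>Greg<BI>")]
safe = [("\r\rGreg", "\r\r<B>Greg<B>"), ("\r\r<B>Greg<B>", "\r\r<BI>Greg<BI>")]

print(repr(run(text, risky, 16)))   # the two returns end up in different buffers
print(repr(run(text, safe, 16)))    # both returns travel with "Greg"
```

In this model the risky set only protects the one return named in
its first search, so the first of the two returns can be written out
with the earlier buffer and the second search never sees the pair.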
Torquemada uses 29 characters in low ASCII for its own storage
purposes; all of the various wildthings are flagged with these codes.
In consequence, the software needs to insulate any low ASCII characters
found in the source. Otherwise, they would look like wildthings during
searches. So at read time, Torquemada is converting low ASCII found in
the source to a relatively harmless form, e.g. <\#008>. ASCII 1 through
8, 11, 12, and 14 through 29 are handled this way.
ASCII 10 (the linefeed) is handled with a certain measure of
intelligence. The linefeed is the poor relation among control
characters. It had a great day in the sun on Unix systems, where its
use is analogous to the carriage return on the Mac. On DOS systems,
it is an entirely redundant Siamese twin to the carriage return. And
on Macs, the poor little linefeed isn't used at all. So: if
Torquemada finds himself processing a Unix file, he will convert the
linefeeds to carriage returns. Linefeeds in DOS files are thrown away
with malice aforethought.
ASCII 9 and 13 (Tab and Return) are not converted.
ASCII 30 is used by Word to denote non-breaking words, so we're
converting that to <\h>, the XPress Tags-language mnemonic for "don't
hyphenate". And Word uses ASCII 31 for the non-breaking hyphen, so
we're showing that as <\!#45>, XPress Tags for a non-breaking hyphen.
All of these conversions happen before any searches are run, so you
can search for the rare instance of low ASCII by using these codes.
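The read-time conversions above can be summarized in a few lines of
Python. This is a minimal sketch, not the real thing: the function
name insulate is hypothetical, the three-digit zero-padded code is
inferred from the single example <\#008>, and linefeeds are handled
one occurrence at a time rather than by deciding up front whether
the whole file is DOS or Unix.

```python
def insulate(text):
    """Rewrite low-ASCII characters in source text as harmless visible codes."""
    # A return/linefeed pair (DOS) loses its linefeed;
    # a lone linefeed (Unix) becomes a return.
    text = text.replace("\r\n", "\r").replace("\n", "\r")
    out = []
    for ch in text:
        code = ord(ch)
        if code == 30:
            out.append("<\\h>")              # Word's ASCII 30
        elif code == 31:
            out.append("<\\!#45>")           # Word's ASCII 31
        elif 1 <= code <= 29 and code not in (9, 13):
            out.append("<\\#%03d>" % code)   # 1-8, 11, 12, 14-29
        else:
            out.append(ch)                   # tab, return, everything else
    return "".join(out)


print(insulate("a\x08b"))
print(repr(insulate("one\r\ntwo\nthree\tend")))
```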
2. String Literals
The simplest use that can be made of Torquemada is to search for
string literals. This means literal text that actually appears in the
source file. For example, you could search for "Elvis" and replace
with "The King". If "Elvis" is present in the file, the replacement
will be made. Torquemada searches are case sensitive, which means
that if "elvis" or "ELVIS" are found in the file, the replacement
will not be made.
String literal searching is common; most applications that have a
search and replace function work only on string literals. Used this
way, TQM offers not much more than they, except that you can search
for 20 (or 640!) string literals at once.
But consider: residents of British Commonwealth nations have a tough
way to go when they try to use text written in the United States. A
massive batch of string literals can be just the ticket.
And: string literals help to establish uniqueness in search strings
that employ wildthings of various stripes. (I'm not sure I really
have to say this, so I'd better.) Uniqueness means taking pains to be
sure that a search finds only those particular strings you want
changed, and not others. For example, our Queen's English-speaking
friends might want to change color and favor to colour and favour.
They might write the literal search:
or our
This is pretty drastically non-unique. It would change the two
desired words, but it would also change the name of this software to
Tourquemada.
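The same accident is easy to reproduce with any literal search and
replace. Here is a small Python illustration (the sample sentence is
made up); Python's str.replace is case sensitive and purely literal,
which is all that is needed to show the problem.

```python
text = "Torquemada can change the color you favor."

# The non-unique search hits every "or", wanted or not.
careless = text.replace("or", "our")
print(careless)

# Longer literals are unique enough to leave the program's name alone.
careful = text.replace("color", "colour").replace("favor", "favour")
print(careful)

# Case matters: "elvis" is not "Elvis", so nothing changes here.
unchanged = "elvis lives".replace("Elvis", "The King")
print(unchanged)
```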
Uniqueness is one of those things that can only be learned the hard
way. So if the idea is new to you, spend half a day playing with
copies of files.
3. Aliases
ALIASES—Match special text characters
^T or ^t    Tab
^P or ^p    Carriage return
^^          Caret
You can't type a tab or a return in a dialog box. The Mac toolbox
filters for these characters, regarding them as commands. Tab moves
you from field to field, while hitting return is the same as clicking
the mouse on the Okay button. This is hardly news.
If you want to search for one of these characters, you have to use an
'alias' of it. Torquemada uses Word's convention for denoting these
aliases: tab is known as ^t or ^T, and return is wanted in five
states under the name ^p or ^P. The caret character is used to flag
all of Torque's many wildthings, so it too should be aliased in
search and replace strings: ^^. The caret is ideal for our purposes,
since it almost never appears in text.
Take note that you _can_ Paste tabs and returns into a dialog box. If
you do, Torquemada will behave as you expect (since the aliases are
just being turned into the literal tab and return characters at
search time). When you Save Set, a clean-up is run that converts any
Pasted tabs or returns into their aliased form.
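As a rough picture of what happens to aliases, here is a minimal
Python sketch. Both function names are hypothetical, and the code
only knows the three aliases listed above; any other caret code is
passed through untouched for the rest of the machinery to handle.

```python
ALIASES = {"t": "\t", "p": "\r", "^": "^"}


def expand_aliases(s):
    """Turn ^t, ^p and ^^ (either case) into the characters they stand for."""
    out, i = [], 0
    while i < len(s):
        if s[i] == "^" and i + 1 < len(s) and s[i + 1].lower() in ALIASES:
            out.append(ALIASES[s[i + 1].lower()])
            i += 2                   # consume the caret and its letter together
        else:
            out.append(s[i])
            i += 1
    return "".join(out)


def alias_pasted(s):
    """Roughly the Save Set clean-up: literal tabs and returns become aliases."""
    return s.replace("\t", "^t").replace("\r", "^p")


print(repr(expand_aliases("^pGreg^T")))
print(alias_pasted("Pasted\ttab and return\r"))
```

Scanning left to right matters: "^^p" is an aliased caret followed
by a plain "p", not a caret followed by a return.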
4. Case Conversion
CASE CONVERSION COMMANDS—Can be used only on the replace side;
accented characters are handled intelligently
^C or ^c    CONVERT TO ALL CAPS
^L or ^l    convert to all lower case
^S or ^s    Convert to sentence caps
^U or ^u    Convert To Upstyle Caps
^D or ^d    Convert to Downstyle Caps
^=          Cancel all case conversion
Torquemada includes six case conversion commands that are available
on the replace side only. If your source text is typed ALL CAPS, you
can convert it to sentence caps fairly easily. The commands can be
used only on the replace side because they control the output format
of the text found; they are not themselves something that can be
searched for. If you forget and put one in on the search side, TQM
will cheerfully ignore it with no untoward consequences.
But: case conversion by software is far from an exact science. The
commands will get you closer to where you want to be, but you may
have to run additional searches or do manual edits to achieve the
final desired results. This is particularly true of sentence caps and
downstyle caps (first letter of every word capitalized, except for
words of three or fewer letters).
However: Torquemada is pretty smart. If your text is totally toasted,
using these commands will save you a _lot_ of time. And the
Inquisitor is a good global citizen: accented characters are
converted appropriately. Moreover, Torque knows when to say when.
Quark's XPress Tags language is case-sensitive ('k' and 'K' mean two
different things). Only one of XP8's commands is case-sensitive. But
in both cases, we're insulating those commands from case conversion.
Face it: a file that is typed _all_ in the wrong case is the rare
bird. Further on, we'll see how to use these commands in conjunction
with other wildthings to selectively change case. For example, you
could take just the subheads to ALL CAPS, or change any ALL CAPS
headline to Upstyle Caps. For now, we'll just talk about using these
with string literals. This search:
Elvis ^CElvis^=
would result in the word "ELVIS" being written out to the file. In
the replace string, ^C means convert to all caps, and ^= means cease
to convert case. (Of course, a search this simple could be done simply
with literals.) You don't need to use ^= if it is the last thing in
the replace string; Torque will put one there if it doesn't find one.
On the other hand, if you need to terminate case conversion _within_
the replace, then you must explicitly turn it off:
Elvis is still the king     ^CElvis^= is still ^Uthe king
would result in the text "ELVIS is still The King". We turned
capitalization off after "ELVIS" to avoid changing the case of "is
still". We didn't need to turn upstyle off, since it will be
terminated automatically at the end of the replace string.
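The way a case command stays in force until ^= (or the end of the
replace string) can be modeled in a few lines of Python. This is a
sketch only: render and upstyle are hypothetical names, only ^C, ^L,
^U and ^= are handled, upstyle is assumed to mean "capitalize the
first letter of every word", and the ^^ alias is ignored.

```python
import re


def upstyle(text):
    """Assumed meaning of Upstyle: first letter of each word capitalized."""
    return re.sub(r"\S+",
                  lambda m: m.group(0)[0].upper() + m.group(0)[1:].lower(),
                  text)


MODES = {"c": str.upper, "l": str.lower, "u": upstyle, "=": lambda t: t}


def render(replace):
    """Apply each case command to the literal text that follows it."""
    out = []
    convert = MODES["="]                 # no conversion until a command says so
    for i, piece in enumerate(re.split(r"\^([CcLlUu=])", replace)):
        if i % 2:                        # odd pieces are the captured command letters
            convert = MODES[piece.lower()]
        else:
            out.append(convert(piece))
    return "".join(out)


print(render("^CElvis^= is still ^Uthe king"))
```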
5. Untyped Wildcards
UNTYPED WILDCARDS—Match any one character
^0, ^1, ^2, ^3, ^4, ^5, ^6, ^7, ^8, ^9
Okay, this is where the bullet hits the bone. I've been dancing
around this for quite a while, because I wanted to establish good
ground rules before we got to the more hirsute topics. Well, here we
are...
Torquemada has (count 'em) 10 untyped wildcards. 'Wildcards' means
they are not literal characters, but rather markers that will match
literal characters. 'Untyped' means they will match _any_ character.
In Word, '?' is a wildcard, but it can only be used on the search
side. Torquemada's wildcards can be used on the search side, and can
be used, omitted or resequenced on the replace side. Torquemada's
untyped wildcards are denoted by the caret character followed by
a number from 0 to 9. Each one of them will match any character
and _store_ the character matched. Taking our earlier example:
^0^1^2or ^0^1^2our
will match and change color and favor. Unfortunately, it still lacks
uniqueness. Because these wildcards are totally untyped, the search
string will also match any instance of the letters "or" and the three
characters before it.
Here's a better example. Suppose we have text that looks like this:
1. Blah.
2. Blah blah.
3. Blah. Blah blah.
If we needed to modify this, say to add a Quark 'Indent Here' command
to make sure that blahish turnovers hang on the indent, we could do:
^p^0.{space} ^p^0.{space}<\i>
Surely we could do this with literals, but that would get tiresome.
Instead, we can use our knowledge to carefully control ignorance, and
accomplish in one string what might otherwise take three. Note that
we are isolating to uniqueness the text we hit; without the ^p and
the {space}, '^0.' would also match the ends of sentences.
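For readers who know regular expressions, an untyped wildcard
behaves much like a capturing "." that can be played back on the
replace side. The Python sketch below translates a search string
into a regular expression on that assumption. It is an analogy, not
Torquemada's engine: the names tokens and torque are hypothetical,
only ^0 through ^9 and the three aliases are understood, and each
wildcard may appear only once in the search string.

```python
import re

ALIASES = {"p": "\r", "t": "\t", "^": "^"}
TOKEN = re.compile(r"\^(.)|(.)", re.DOTALL)


def tokens(s):
    """Yield ('wild', digit) for ^0..^9 and ('text', characters) otherwise."""
    for m in TOKEN.finditer(s):
        code, literal = m.groups()
        if literal is not None:
            yield "text", literal
        elif code in "0123456789":
            yield "wild", code
        else:
            yield "text", ALIASES[code.lower()]   # KeyError for anything else


def torque(text, search, replace):
    """Run one search and replace pair over text."""
    pattern = "".join(
        "(?P<w%s>.)" % value if kind == "wild" else re.escape(value)
        for kind, value in tokens(search)
    )

    def fill(match):
        # Play back the stored characters in whatever order the replace asks.
        return "".join(
            match.group("w" + value) if kind == "wild" else value
            for kind, value in tokens(replace)
        )

    return re.sub(pattern, fill, text, flags=re.DOTALL)


blah = "\r1. Blah.\r2. Blah blah.\r3. Blah. Blah blah."
print(repr(torque(blah, "^p^0. ", "^p^0. <\\i>")))
print(torque("a Torquemada", "^0^1^2or", "^0^1^2our"))
```

The last line shows the uniqueness problem again: three characters
of any kind followed by "or" is still far too general.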
Perhaps the best use of untyped wildcards is as ballast for the typed
wildcards (to be discussed next). They do what they do alone quite
well, but it's almost too much ignorance safely to be borne.
6. Typed Wildcards
TYPED WILDCARDS—Match any one character of that type
^+    Uppercase character (includes accented characters)
^-    Lowercase character (includes accented characters)
^±    Character of either case (includes accented characters)
^&    Alphanumeric character (letter or number, not space or punctuation)
^%    Tabular character (digit, space or punct.; not alphabetical)
^$    Printable character (all characters _except_ space characters)
^!    Punctuation character (includes high-ASCII punctuation)
^#    Numeric character (digits only)
^_    Space character (space, return, tab, option space)
These are alike unto the untyped wildcards except that they are
strongly typed. A typed wildcard will only match a character of its
type. So, for instance: ^+ will only match uppercase characters, with
all others failing to match. The wildcard ^# will match any digit,
and ^_ will match any space character, so we can go back to blahville
and do a much better job:
^p^#.^_ ^p^#.^_<\i>
This is now fully generalized yet completely unique. If there is a
subtopic such as:
a. Subblah.
it will fail to match. If we need to do something different with
that, we can match it with another, different string.
But: wildcards (typed or untyped) match and store only _one_
character. If you search for:
^#^#
only the second digit will be stored. If you need to match and store
two characters of one type, you can use the typed wildcards to
establish uniqueness, then fill out your team with untyped wildcards.
Suppose we needed to match:
10. Double-digit blah.
This would work:
^p^#^0.^_ ^p^#^0.^_<\i>
Pretty cryptic, not. This says: where you find a return followed by a
digit followed by any other character followed by a period followed
by a space character, soak it all up and spew it back out, appending
an XPress Tags 'Indent Here' command. We've gone from string literals
that were readable but not very useful to _this_, a vitally important
message from space aliens (grin). And it gets worse. In the next
section, we'll discuss an even better - and more cryptic - way of
handling this type of problem.
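In regular-expression terms, the typed wildcards used here are
character classes. The Python sketch below hand-translates the two
searches above so you can watch what they hit and what they leave
alone. The translation is my own approximation: the option space is
assumed to be the non-breaking space (U+00A0), and a regular
expression stores every group it captures, so the reason for writing
^#^0 instead of ^#^# does not show up in this model.

```python
import re

SPACE = r"[ \r\t\u00a0]"    # space, return, tab, option space (assumed U+00A0)
DIGIT = r"[0-9]"

# ^p^#.^_  with  ^p^#.^_<\i>  on the replace side
one_digit = re.compile(r"\r(%s)\.(%s)" % (DIGIT, SPACE))
# ^p^#^0.^_  with  ^p^#^0.^_<\i>  on the replace side
two_chars = re.compile(r"\r(%s)(.)\.(%s)" % (DIGIT, SPACE), re.DOTALL)


def indent_here(match):
    """Write back everything matched, then append the Indent Here tag."""
    return match.group(0) + "<\\i>"


text = "\r1. Blah.\r10. Double-digit blah.\ra. Subblah."
print(repr(one_digit.sub(indent_here, text)))
print(repr(two_chars.sub(indent_here, text)))
```

Neither pattern touches the "a." subtopic, because a letter is not a
digit.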
Not for non-nerds only:
Mike Arst quite correctly pinned me to the mat for not going into
these guys in greater detail. Among the wildthings, these present the
greatest potential for confusion. Untyped wildcards (discussed above)
and wildstrings (discussed below) match _anything_. The typed
wildcards only match characters _of their type._ So: this is a
further elucidation of what "of their type" means.
Torquemada is written in the C programming language, and the idea of
typed wildcards is borrowed, analogically, from the character typing
functions available in the standard C function libraries. Where
appropriate, these commands use the full Macintosh character set,
where the C functions do not, but the idea is basically the same. A
typed wildcard matches the source character _if and only if_ the
character is of that type; ^+ will match _only_ uppercase letters,
not lowercase letters, not space or punctuation characters, not
digits.
Taking them one by one:
^+ Will match and store any one uppercase alphabetical
character. Alphabetical characters include the accented Macintosh
characters (e.g., Á, Å). This will _fail_ to match any character that
is not an uppercase character.
^- Will match and store any one lowercase alphabetical
character. Alphabetical characters include the accented Macintosh
characters (e.g., á, å). This will _fail_ to match any character that
is not a lowercase character.
^± Will match and store any one alphabetical character.
Alphabetical characters include the accented Macintosh characters
(e.g., Á, Å, á, å). This will _fail_ to match any character that is
not an alphabetical character. This wildcard is the logical opposite
of the tabular character wildcard (^%).
^& Will match and store any one alphanumeric character.
Alphanumeric characters are alphabetical characters (including
accented characters) and the ten digits. This wildcard will _fail_ to
match any space or punctuation character, which can make it useful
for establishing uniqueness.
^% Will match and store any one tabular character. A tabular
character is any one of the ten digits, a space character (space,
return, tab, or option space), or a punctuation character (including
Macintosh high-ASCII punctuation such as ° or ‡). This wildcard is
intended primarily for matching the elements that make up a table,
exclusive of the explanatory text. Consequently, it will _fail_ to
match any alphabetical character, which can make it useful for
establishing uniqueness. This wildcard is the logical opposite of the
alphabetical character wildcard (^±).
^$ Will match and store any one printable character. A printable
character is one that makes marks on paper, which means that this
wildcard will match all alphabetical characters, all ten digits, and
all punctuation characters. It will _fail_ to match the space
characters (space, return, tab or option space). This wildcard is the
logical opposite of the space character wildcard (^_).
^! Will match and store any one punctuation character (including
Macintosh high-ASCII punctuation such as ° or ‡). It will _fail_ to
match alphabetical, numeric, or space characters.
^# Will match and store any one numeric character (the ten
digits). It will _fail_ to match alphabetical, space, or punctuation
characters.
^_ Will match and store any one space character (space, return,
tab, or option space). It will _fail_ to match all alphabetical,
numeric, or punctuation characters. This wildcard is the logical
opposite of the printable character wildcard (^$).
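The nine definitions above boil down to four basic questions about a
character: is it alphabetical, a digit, a space, or punctuation?
Here is a minimal Python sketch of that idea, in the spirit of the C
character typing functions. It is an approximation, not the real
tables: it uses Python's Unicode rules rather than the Macintosh
character set, it assumes the option space is U+00A0, and it treats
punctuation as "anything printable that is none of the other three".

```python
SPACES = " \r\t\u00a0"      # space, return, tab, option space (assumed U+00A0)
DIGITS = "0123456789"


def is_alpha(ch):
    return ch.isalpha()


def is_digit(ch):
    return ch in DIGITS


def is_space(ch):
    return ch in SPACES


def is_punct(ch):
    # Assumption: printable, and not a letter, digit or space.
    return ch.isprintable() and not (is_alpha(ch) or is_digit(ch) or is_space(ch))


TYPED = {
    "+": lambda ch: is_alpha(ch) and ch.isupper(),
    "-": lambda ch: is_alpha(ch) and ch.islower(),
    "±": is_alpha,
    "&": lambda ch: is_alpha(ch) or is_digit(ch),
    "%": lambda ch: is_digit(ch) or is_space(ch) or is_punct(ch),
    "$": lambda ch: is_alpha(ch) or is_digit(ch) or is_punct(ch),
    "!": is_punct,
    "#": is_digit,
    "_": is_space,
}


def matches(wildcard, ch):
    """True if the single character ch is of the wildcard's type."""
    return TYPED[wildcard](ch)


for ch in "Áå7 .":
    print(repr(ch), [w for w in TYPED if matches(w, ch)])
```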
7. Wildstrings
WILDSTRINGS—Match and store any text until full pattern is matched
^*, ^~, ^?, ^@
Now this is the really cool stuff. There are four wildstrings and
each will match and store any text until the pattern defined in the
search string is satisfied. As an example, consider a full case
conversion. You've got a file typed in ALL CAPS, and you need it to be
Sentence caps. This will do the job:
^~ ^S^~
The search string says: "match anything, from the start of the buffer
to the end." The replace string says, "spew it all back out with the
case converted".
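To make the effect concrete, here is a minimal Python sketch of what a sentence-caps conversion might do to the stored text. The rule used (capitalize the first letter after a period, exclamation point, or question mark, and lowercase every other letter) is an assumption for illustration, not Torquemada's documented algorithm, and sentence_caps is a hypothetical name.

```python
def sentence_caps(text):
    """Capitalize the first letter of each sentence, lowercase the rest.

    Assumption: a sentence ends at '.', '!' or '?'.
    """
    out = []
    at_sentence_start = True
    for ch in text:
        if ch.isalpha():
            out.append(ch.upper() if at_sentence_start else ch.lower())
            at_sentence_start = False
        else:
            out.append(ch)
            if ch in ".!?":
                at_sentence_start = True
    return "".join(out)


print(sentence_caps("THIS IS LOUD. SO IS THIS!"))
```

A rule this simple will also recapitalize after abbreviations, and it flattens every proper noun, which is why a pass like this usually needs follow-up searches.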
A more complicated example: the client was thoughtful enough to
provide the keystrokes on disk, but the heads and subheads are all
typed ALL CAPS (there are many such thoughtful clients). To convert
them all to Upstyle (presuming that the heads to be hit are preceded
by double-returns), you could do:
^p^p^*^p ^p^p^U^*^p
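The same idea can be approximated with a regular expression. This Python sketch assumes Mac-style returns ("\r"), treats Upstyle as "capitalize the first letter of every word", and uses hypothetical names (upstyle, upstyle_heads). It checks for the closing return with a lookahead instead of consuming it, which is a choice made here and not necessarily how Torquemada scans.

```python
import re


def upstyle(text):
    # Assumption: Upstyle means every word gets an initial cap.
    return re.sub(r"\S+", lambda m: m.group(0).capitalize(), text)


# Two returns, then one line of text, then (not consumed) a return.
HEAD = re.compile(r"\r\r([^\r]+)(?=\r)")


def upstyle_heads(text):
    """Convert lines preceded by a double return to Upstyle."""
    return HEAD.sub(lambda m: "\r\r" + upstyle(m.group(1)), text)


sample = "intro\r\rFIRST HEAD\rbody text\r\rSECOND HEAD\rmore\r"
print(repr(upstyle_heads(sample)))
```

Body lines are left alone because they follow a single return, which is the "uniqueness" the double return buys you.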
Wildstrings are massively general. Except for gross transformations,
you really have to build a lot of uniqueness into the pattern, or
they will bleed all over you. But try this on for size: I wish I
could count the number of times I've gotten files that were typed
like this:
First Fifth
Second Sixth
Third Seventh
Fourth Eighth
In every publishing system known to humanity, this should be typed as
one long column, with fifth following fourth. The helpful clients
type it in two columns with tabs so that you'll know it should be one
link spanning a gutter. Do it the cheap way, with tabs, and surely
you'll have to insert a line between second and third. Have fun
cutting and pasting... Or: run a set like this:
^t^*^p ^p
This will throw away everything in the right column, preserving
everything in the left. Then run:
^p^*^t ^p
This will throw away everything in the left column, preserving the
right.
Then concatenate the two files and you've got the text the way you
need it, in one long column.
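The net result of the two passes plus the concatenation can be sketched in a few lines of Python. This is only an illustration of the outcome, not of how Torquemada does it. It assumes rows end with a Mac return ("\r") and columns are separated by a single tab, and unstack_columns is a hypothetical name.

```python
def unstack_columns(text, ret="\r"):
    """Turn tab-separated two-column rows into one long column."""
    rows = [row.split("\t", 1) for row in text.split(ret) if row]
    left = [cells[0] for cells in rows]
    # Rows with no tab contribute nothing to the right column.
    right = [cells[1] for cells in rows if len(cells) > 1]
    return ret.join(left + right) + ret


sample = "First\tFifth\rSecond\tSixth\rThird\tSeventh\rFourth\tEighth\r"
print(repr(unstack_columns(sample)))
```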
How these work: wildstrings will match and store any characters found
in the source until the full pattern is matched. If the pattern is
matched immediately after the call to the wildstring, it will store
zero characters. This is useful. It means you can say:
^p^~^$ ^p^$
This will find all runs of one or more returns (or returns with other
space characters) and compact them down to exactly one return. If the
file has only single returns in it, the wildstring will contain zero
characters each time it is employed.
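A lazy regular expression behaves much the same way, which makes it a handy model. In this Python sketch the wildstring is written as a shortest-possible match, and "printable" is assumed to mean anything other than space, return, tab, or no-break space. The names are hypothetical.

```python
import re

# Return, then as little as possible (the wildstring), then one
# printable character. DOTALL lets "." cross line boundaries.
RUN = re.compile(r"\r(.*?)([^ \r\t\u00a0])", re.DOTALL)


def compact_returns(text):
    """Collapse each run of returns and spaces to a single return."""
    return RUN.sub(lambda m: "\r" + m.group(2), text)


print(repr(compact_returns("one\r\r\rtwo\r \t\rthree\rfour")))
```

When a return is followed directly by a printable character, the first group is simply empty and the text comes back unchanged, which is the zero-character case described above.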
If the pattern fails to match, the text is passed through unaltered.
A wildstring can contain the whole buffer of text, as shown above.
Searching for a pattern match will continue to the end of the buffer.
In consequence, insufficient uniqueness can result in failures to
find a match along with very slow performance. So: be sure to deploy
these puppies in full cognizance of the actual contents of the file.
Wildstrings are completely untyped, but they can be fairly strongly
'typed' by the characters used before and after them. Needless to say,
string literals provide the strongest typing, but the more strident
typed wildcards also do very well. The string above that illustrates
how to compact runs of returns is a good example: we are saying
"match everything from a return to a printable character". Printable
characters are characters that are _not_ spaces, so the wildstring is
'typed' to contain _only_ space characters. A string like this:
^t^%^~^p ^t<f"e century">^%^~<f$>^p
will find the tabular columns of a table (viz., not the text stubs at
the left margin) and plug in XPress Tags coding to change the font.
A search string can contain more than one wildstring, up to the full
complement of four. At the same time, wildstrings can be resequenced
on the replace side. So, if that Officers and Directors chart comes
with the jobs before the names, you can swap them.
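As a rough illustration of resequencing, this Python sketch captures two spans per row and emits them in the opposite order. It assumes each row is "job, tab, name, return" with Mac-style returns. The sample names and the function name swap_columns are made up for the example.

```python
import re

# First capture: text up to the tab. Second: text up to the return.
ROW = re.compile(r"([^\t\r]*)\t([^\r]*)\r")


def swap_columns(text):
    """Emit the second stored span before the first."""
    return ROW.sub(lambda m: m.group(2) + "\t" + m.group(1) + "\r", text)


sample = "President\tJane Doe\rTreasurer\tJohn Roe\r"
print(repr(swap_columns(sample)))
```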
One more (this happened to me this week, and it's happened many times
in the past): pick up last year's financials, lose the right-most
column, kick everything right, and plug in a new column for this
year's figures. Bah! Mondo-beyondo manual labor, even with QuicKeys.
Here's a quick set that does the job on my kind of three-column
table:
<t41>^t<t-3>^t^*^t^~^t^?^p <t41>^t<t-3>^tXX^t^*^t^~^p
The "XX" is there to hold the tab, to give the operator something to
double-click on. Yes, the operator had to type the figures;
intelligent hashing is for next week (grin). But: moving the columns
took minutes instead of hours, and the proofreader only had to read
the new copy, not the whole job.
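Here is the same column shuffle as a hedged Python sketch. The three captures stand in for ^*, ^~ and ^?, the codes <t41> and <t-3> are copied from the set above as plain literals, and the sample figures are invented. Negated character classes are used instead of lazy matching so that a short row cannot bleed into the next one. That is a choice made for this sketch.

```python
import re

ROW = re.compile(
    r"<t41>\t<t-3>\t([^\t\r]*)\t([^\t\r]*)\t([^\r]*)\r"
)


def shift_columns(text, placeholder="XX"):
    """Drop the right-most column, shift the others right, and put a
    placeholder where this year's figure will be typed."""
    def rebuild(m):
        return ("<t41>\t<t-3>\t" + placeholder + "\t"
                + m.group(1) + "\t" + m.group(2) + "\r")
    return ROW.sub(rebuild, text)


sample = "Revenue<t41>\t<t-3>\t1991\t1990\t1989\r"
print(repr(shift_columns(sample)))
```

The third capture is matched but never written back, which is all "losing" a column amounts to.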
There is a _ton_ of power in these babies, so do take the time to
master them. The effort will be repaid a hundred-fold (or a hundred
times your money back! (grin)).
8. Putting It All Together
There is a boatload of Torquemada sets travelling with this document,
to illustrate various points discussed. None of the sets here is
very elaborate, but my feeling is (despite what you may surmise from
reading the DTPForum on CompuServe) the elaborate set is the
exception, rather than the rule. Most search and replace jobs are
fairly simple if they can be generalized to their essence.
As an admonition: if this stuff is largely new to you, you are well
advised to take your Torque in small doses. Build the vicious set
that does the worst of the reformatting, then have a look at the
output file. Write a new set to do the finer sifting, then have
another look. There is no shame in having a file named
"file.TQM.TQM.TQM.TQM". The shameful thing would be toasting your
text by trying to do too much at once. My own files tend to be named
"kill.XP8.TQM.XP8.TQM", since I use XP8 to do the gross clean up,
Torque to code for XP8, XP8 to unpack the Torque coding (e.g., I'd
much rather replace with "[n77]" than "<f"univ newswcommpi">M<f$>"),
then Torque, finally, to finesse. What works best is what works
fastest, not the set that wins the Nobel Prize for Cryptic
Communication With Space Aliens (this from the man who came up with
all these cryptic commands!).
Here's a brief discussion of the sets enclosed and what they do:
'Stupefaction' uses nothing but string literals and aliases to recode
Mac-like text into a form that can be used on computers less swift.
Ideally, you should reformat the output in an editor (e.g., Save As
Text Only With Linebreaks from Word). That way, those DOS pigs might
actually be able to open the file.
'PC to Mac PostScript' uses literals and one wildstring to clean up
PostScript files that originate on DOS machines but are being
downloaded from Macs. Torquemada automatically removes the linefeeds
(which are not a problem in any case). This set removes other
characters, common in PC-PostScript files, which _can_ be a problem
(most notably the control-D character which starts and ends many
PC-PostScript files).
'Reformat DOS File (Commented)' does the opposite. It takes files
that originated on DOS systems and reformats them to a fairly
Mac-like form. This set is interesting on several grounds. First, it
illustrates how to comment a TQM set: if the search string is empty,
the replace string is ignored. Consequently, you can embed
explanatory comments in your sets simply by typing them into the
replace side. Since you can run up to 32 sets in a batch, this can be
a swell idea: logically separate the types of searches you're
running, then comment each set for future reference. This file is
also making moderately interesting use of wildstrings. And finally,
the set is using markers to permit intelligent processing of special
cases.
This warrants its own paragraph: recall that a later search can look
for things left behind by an earlier replace. It is entirely possible
to have a problem complicated enough that you cannot fully resolve
all doubts in one search. In a case like that, you can leave markers
behind (viz., "|" or "][", anything that is unlikely to show up in
the text), then operate on them with later searches. In extreme
cases, you might need to drop in two or more markers, then operate
intelligently on the quantity present when you get around to ditching
them.
Or, suppose you get a file in Atari or Commodore ASCII. This seems
unlikely at this late date, but stranger things have happened. The
ASCII in those two systems was swapped: uppercase lived where
lowercase belonged, and vice versa. Presumably this was done for a
reason (to make sorting difficult?), but, whyever, it's a big problem
for you. This set will toggle the case of any file:
^+ |^+
Mark existing caps
^- ^C^-
Existing lower case to ALL CAPS
|^+ ^l^+
Marked caps to lower case
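The three passes translate almost line for line into Python, which makes the role of the marker easy to see. This is a sketch with hypothetical names, and it assumes, as the set does, that "|" never appears in the text in front of a letter.

```python
import re

MARK = "|"


def toggle_case(text):
    # Pass 1: mark existing caps.
    marked = "".join(MARK + ch if ch.isupper() else ch for ch in text)
    # Pass 2: existing lower case to ALL CAPS.
    raised = "".join(ch.upper() if ch.islower() else ch for ch in marked)

    # Pass 3: marked caps to lower case, dropping the marker.
    def lower_marked(m):
        ch = m.group(1)
        return ch.lower() if ch.isupper() else m.group(0)

    return re.sub(re.escape(MARK) + r"(.)", lower_marked, raised,
                  flags=re.DOTALL)


print(toggle_case("Hello World"))
```

Without pass 1, pass 2 would leave everything in caps and there would be no way to tell which letters had been capitals to begin with. If the text already contains the marker ahead of a letter, pass 3 will misfire, so pick a marker accordingly.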
'Sentence Caps' shows the first thrust at a very thorny problem.
Suppose you have a file that was typed ALL CAPS. Not all that common,
but it happens. Getting to Sentence caps is fairly easy, but in
getting there, you will have lost most of the caps on proper nouns.
Get set to run a load of literals, because that's the only way to
catch most of them. But at least one can be captured with a
generalized search, as shown in the second search in this set. This
will convert names in the form of Firstname I. Lastname back to
initial caps.
'XP8 to PageMaker®' takes text processed by XP8 or Saved As XPress
Tags from Quark and puts it into a form that can be used by the Smart
ASCII filter in PageMaker. Aldus has promised a smarter Smart ASCII
filter. If you use PM, you might entreat them to hurry, because there
is a lot of cool stuff that can be done with Torquemada if there is
proper support from the destination application.
'Preserve Left Column' and 'Preserve Right Column' are discussed
above.
'Code Alternating Paragraphs' uses wildstrings to code any file that
comes in the form of:
A
B
A
B
Examples of this type of file: Q&A files, Officers and Directors
tables, phone or store listings, etc. The codes shown are XPress
Tags, but this would work just as well with PageMaker Smart ASCII
tags.
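In outline, the job is to prefix odd and even paragraphs with different codes. A minimal Python sketch follows, assuming Mac-style returns and no empty paragraphs. The style names "question" and "answer" are hypothetical stand-ins in XPress Tags "@name:" form, not the codes used in the enclosed set.

```python
def code_alternating(text, tag_a="@question:", tag_b="@answer:", ret="\r"):
    """Prefix paragraphs alternately with tag_a and tag_b."""
    paragraphs = [p for p in text.split(ret) if p]
    tags = (tag_a, tag_b)
    return "".join(
        tags[i % 2] + para + ret for i, para in enumerate(paragraphs)
    )


sample = "Q1\rA1\rQ2\rA2\r"
print(repr(code_alternating(sample)))
```

Swap in whatever tags the destination application wants; the alternation is the only part that matters.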
'Code Well-Ordered File' uses wildstrings to code files that come in
the form of:
Head
Sub
Body
Sub
Body
Head
Body
Head
Sub
Body
The presumption is that headlines and subheads are preceded by
multiple returns, which is usually the case with this type of file.
Note that we are using two different body styles, since frequently
you want to omit the paragraph indent for the first paragraph after a
head.
And you can see the point: the judicious use of Torquemada can remove
much of the labor - and certainly the most onerous labor - from a text
processing job. In my own work, I am striving to do everything with
software, with no manual labor at all...
9. Bringing It All Back Home
This is way awesome cool, if I do say so myself (grin).
My little ugly duckling has become a text-processing powerhouse.
Six-hundred-forty searches on 128 files with 29 wildthings yields a
solution to all but the most intractable text-processing problems...
And this is but barely the beginning! Torquemada 2.0 will include a
host of new features, including:
* 50 strings in the Pyre of Purification, split up in five pages. The
editing area will be twice its current width. Existing sets will be
compatible, and you will still be able to load up to 32 sets in a
batch, yielding a total of 1600 strings!
* Whitespace characters will be shown as dots for readability.
* GREP-like regular expression parsing (with the existing wildcards
still supported)
* The size of the buffer will be limited only by allocated RAM
* Control over output file name and creator type
* Plus some other stuff (grin)...
In the interim, enjoy my boy Torquemada as he is today, and please do
not hesitate to contact me with any questions, suggestions or
problems.
Very Best!,
Greg Swann
5/14/92
TORQUEMADA QUICK REFERENCE
ALIASES—Match special text characters
^T or ^t Tab
^P or ^p Carriage return
^^ Caret
UNTYPED WILDCARDS—Match any one character
^0, ^1, ^2, ^3, ^4, ^5, ^6, ^7, ^8, ^9
TYPED WILDCARDS—Match any one character of that type
^+ Uppercase character (includes accented characters)
^- Lowercase character (includes accented characters)
^± Character of either case (includes accented characters)
^& Alphanumeric character (not space or punctuation)
^% Tabular character (digit, space or punct.; not alphabetical)
^$ Printable character (all characters _except_ space characters)
^! Punctuation character (includes high-ASCII punctuation)
^# Numeric character (digits only)
^_ Space character (space, return, tab, option space)
WILDSTRINGS—Match and store any text until full pattern is matched
^*, ^~, ^?, ^@
CASE CONVERSION COMMANDS—Can be used only on the replace side;
accented characters are handled intelligently
^C or ^c CONVERT TO ALL CAPS
^L or ^l convert to all lower case
^S or ^s Convert to sentence caps
^U or ^u Convert To Upstyle Caps
^D or ^d Convert to Downstyle Caps
^= Cancel all case conversion